The nuclear magnetic resonance (NMR) structure of a globular domain of residues 1071 to 1178 within the previously annotated nucleic acid-binding region (NAB) of severe acute respiratory syndrome coronavirus nonstructural protein 3 (nsp3) has been determined, and N- and C-terminally adjoining polypeptide segments of 37 and 25 residues, respectively, have been shown to form flexibly extended linkers to the preceding globular domain and to the following, as yet uncharacterized domain. This extension of the structural coverage of nsp3 was obtained from NMR studies with an nsp3 construct comprising residues 1066 to 1181 [nsp3(1066-1181)] and the constructs nsp3(1066-1203) and nsp3(1035-1181). A search of the protein structure database indicates that the globular domain of the NAB represents a new fold, with a parallel four-strand β-sheet holding two α-helices of three and four turns that are oriented antiparallel to the β-strands. Two antiparallel two-strand β-sheets and two 3₁₀-helices are anchored against the surface of this barrel-like molecular core. Chemical shift changes upon the addition of single-stranded RNAs (ssRNAs) identified a group of residues that form a positively charged patch on the protein surface as the binding site responsible for the previously reported affinity for nucleic acids. This binding site is similar to the ssRNA-binding site of the sterile alpha motif domain of the Saccharomyces cerevisiae Vts1p protein, although the two proteins do not share a common globular fold.


The coronavirus replication cycle begins with the translation of the 29-kb positive-strand genomic RNA to produce two large polyprotein species (pp1a and pp1ab), which are subsequently cleaved to produce 15 or possibly 16 nonstructural proteins (nsp's) (11). Among these, nsp3 is the largest nsp and also the largest coronavirus protein. nsp3 is a glycosylated (16, 22), multidomain (36, 51), integral membrane protein (38). All known coronaviruses encode a homologue of severe acute respiratory syndrome coronavirus (SARS-CoV) nsp3, and sequence analysis suggests that at least some functions of nsp3 may be found in all members of the order Nidovirales (11). Hallmarks of the coronavirus nsp3 proteins include one or two papain-like proteinase domains (3, 12, 16, 31, 56, 62), one to three histone H2A-like macrodomains which may bind RNA or RNA-like substrates (5, 9, 48, 54, 55), and a carboxyl-terminal Y domain of unknown function (13). An extensive bioinformatics analysis of the coronavirus replicase proteins by Snijder et al. (51) provided detailed annotations of the then-recently sequenced SARS-CoV genome (35, 47), including the identification of a domain unique to SARS-CoV and the prediction of the ADP-ribose-1″-phosphatase (ADRP) activity of the X domain (since shown to be one of the macrodomains).
Only limited information is so far available regarding the ways in which the functions of nsp3 are involved in the coronavirus replication cycle. Some functions of nsp3 appear to be directed toward proteins; e.g., the nsp3 proteinase domain cleaves the amino-terminal two or three nsp's from the polyprotein and has deubiquitinating activity (4, 6, 14, 30, 53, 60). Most homologues of the most conserved macrodomain of nsp3 appear to possess ADRP activity (9, 34, 41-43, 48, 59) and may act on protein-conjugated poly(ADP-ribose); however, this function appears to be dispensable for replication (10, 42) and may not be conserved in all coronaviruses (41). The potential involvement of nsp3 in RNA replication is suggested by the presence of several RNA-binding domains (5, 36, 49, 54, 55). nsp3 has been identified in convoluted membrane structures that are also associated with other replicase proteins and that have been shown to be involved in viral RNA synthesis (16, 24, 52), and nsp3 papain-like proteinase activity is essential for replication (14, 62). Other conserved structural features of nsp3 include two ubiquitin-like domains (UB1 and UB2) (45, 49). We have also recently reported that nsp3 is a structural protein, since it was identified as a minor component of purified SARS-CoV preparations, although it is not known whether nsp3 is directly involved in virogenesis or is incidentally incorporated due to protein-protein or protein-RNA interactions (36).

A nucleic acid-binding region (NAB) is located within the polypeptide segment of residues 1035 to 1203 of nsp3. The NAB is expected to be located in the cytoplasm, along with the papain-like protease, ADRP, a region unique to SARS-CoV (the SARS-CoV unique domain [SUD]), and nsp3a, since both the N and C termini of nsp3 were shown previously to be cytoplasmic (38). Two hydrophobic segments are membrane spanning (38), and the NAB is located roughly 200 residues in the N-terminal direction from the first membrane-spanning segment. This paper presents the next step in the structural coverage of nsp3, with the determination of the NAB structure. The structural studies included nuclear magnetic resonance (NMR) characterization of two constructs, an nsp3 construct comprising residues 1035 to 1181 [nsp3(1035-1181)] and nsp3(1066-1203), and complete NMR structure determination for the construct nsp3(1066-1181) (see Fig. 8). The structural data were then used as a platform from which to investigate the nature of the previously reported single-stranded RNA (ssRNA)-binding activity of the NAB (36). Since no three-dimensional (3D) structures for the corresponding domains in other group II coronaviruses are known and since the SARS-CoV NAB has very low sequence identity to other proteins, such data could not readily be derived from comparisons with structurally and functionally characterized homologues.


MATERIALS AND METHODS 


Production of nsp3(1066-1203), nsp3(1066-1181), and nsp3(1035-1181). Initially, the protein production core of the Consortium for Functional and Structural Proteomics of the SARS Coronavirus cloned and expressed a construct encoding nsp3 residues 1066 to 1225 by using expression vector pMHI1F with a six-His tag and a pBAD derivative in Escherichia coli DLA1 cells. The 1D 1H NMR spectrum of the protein indicated the presence of a globular domain, as well as of flexibly disordered polypeptide segments (data not shown). Next, two new constructs, nsp3(1066-1203) and nsp3(1066-1181), were designed based on the disordered-region prediction by GlobPlot (29). Sequences encoding the two constructs were subcloned into pET-28b and pET-25b (Novagen), respectively, and the resulting plasmids were used to transform E. coli strain BL21-CodonPlus(DE3)-RIL (Stratagene). The expression of the uniformly 13C,15N-labeled proteins was carried out by growing freshly transformed cells in M9 minimal medium containing 1 g/liter 15NH4Cl and 4 g/liter D-[13C6]glucose as the sole nitrogen and carbon sources. Cell cultures were grown at 37°C with vigorous shaking to an optical density at 600 nm of 0.8 to 0.9. The temperature was then lowered to 18°C, and after induction with 1 mM isopropyl-β-D-thiogalactopyranoside, the cell cultures were grown for 18 h.

The cells producing nsp3(1066-1203) from the pET-28b vector were harvested by centrifugation, resuspended in extraction buffer (50 mM sodium phosphate at pH 7.5, 150 mM NaCl, 5 mM imidazole, 0.1% Triton X-100, and Complete protease inhibitor tablets [Roche]), and lysed by sonication. The cell debris was removed by centrifugation (20,000 × g for 20 min). For the first purification step, the soluble protein was loaded onto an Ni2+ affinity column (HisTrap; Amersham) equilibrated with 50 mM sodium phosphate buffer, pH 7.5, containing 150 mM NaCl and 5 mM imidazole. The bound proteins were eluted with a 5 to 500 mM imidazole gradient. Fractions containing nsp3(1066-1203) were pooled and concentrated to a volume of 2 ml using centrifugal ultrafiltration devices (Millipore). The buffer was then exchanged by dilution with 8 ml of 50 mM sodium phosphate buffer, pH 7.5, containing 150 mM NaCl and subsequent concentration to 2 ml. After three such cycles, 20 µl of thrombin (Enzyme Research Laboratories) was added, and the reaction was monitored by gel electrophoresis. After 5 h at room temperature, the sample was loaded onto a size exclusion column (Superdex 75; Amersham) equilibrated with 50 mM sodium phosphate buffer, pH 7.5, containing 150 mM NaCl and then eluted with the same buffer. The fractions containing nsp3(1066-1203) were again pooled and concentrated to a final volume of 550 µl for a final protein concentration of 1.4 mM.
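As a rough check on the buffer exchange just described, each cycle (concentrate to 2 ml, dilute with 8 ml, reconcentrate) keeps 2/10 of the small-molecule content of the previous cycle, provided mixing is complete and buffer components pass the membrane freely. Both conditions are idealizing assumptions not stated in the text, and the function name below is hypothetical. Under them, three cycles leave 0.2³ = 0.8% of the original buffer, so even the 500 mM upper end of the imidazole gradient would fall to about 4 mM.

```python
def residual_fraction(retained_ml: float, added_ml: float, cycles: int) -> float:
    """Fraction of the original buffer components left after repeated
    dilute-and-concentrate cycles.

    Assumes ideal mixing and free passage of small molecules through the
    ultrafiltration membrane (idealizations, not measured behavior)."""
    per_cycle = retained_ml / (retained_ml + added_ml)
    return per_cycle ** cycles

frac = residual_fraction(retained_ml=2.0, added_ml=8.0, cycles=3)
print(f"residual fraction after 3 cycles: {frac:.3f}")
# Upper bound: imidazole at the top of the 5-500 mM elution gradient
print(f"imidazole carried over from 500 mM: {500 * frac:.1f} mM")
```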

For the production of nsp3(1066-1181) from the pET-25b vector, cells were lysed as described for nsp3(1066-1203) except that the extraction buffer was 50 mM sodium phosphate, pH 6.5, with 50 mM NaCl, 0.1% Triton X-100, and Complete protease inhibitor tablets. For the first purification step, the soluble protein was loaded onto an anion-exchange column (HiTrap Q FF; Amersham) equilibrated with 50 mM sodium phosphate buffer, pH 6.5, containing 50 mM NaCl. The proteins were eluted with a 50 to 1,000 mM NaCl gradient. Fractions containing the protein were pooled and concentrated to a volume of 10 ml by using centrifugal ultrafiltration devices (Millipore). The sample was loaded onto a size exclusion column (Superdex 75; Amersham) equilibrated with 50 mM sodium phosphate buffer, pH 6.5, containing 50 mM NaCl and then eluted with the same buffer. The fractions containing nsp3(1066-1181) were again pooled and concentrated to a final volume of 500 µl. The solution was then supplemented with 50 µl of D2O and 5.5 µl of 200 mM NaN3 for a protein concentration in the NMR sample of 1.4 mM.
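The final composition of the NMR sample follows from the volumes given above, assuming that the volumes are additive (an idealization). The helper below and its component labels are hypothetical. It gives a total volume of 555.5 µl, about 9% (v/v) D2O, and about 2 mM NaN3.

```python
def final_composition(sample_ul: float, additions: dict[str, tuple[float, float]]):
    """Return total volume (µl) and final concentration of each added component.

    additions maps a component label to (volume_ul, stock_concentration);
    the result is in the same concentration units as the stock.
    Assumes volumes are additive."""
    total = sample_ul + sum(vol for vol, _ in additions.values())
    final = {name: stock * vol / total for name, (vol, stock) in additions.items()}
    return total, final

# D2O stock expressed as 100 (% v/v); NaN3 stock in mM
total, final = final_composition(
    500.0, {"D2O_percent": (50.0, 100.0), "NaN3_mM": (5.5, 200.0)}
)
print(f"total volume: {total} µl")
print(f"D2O: {final['D2O_percent']:.1f} % (v/v)")
print(f"NaN3: {final['NaN3_mM']:.2f} mM")
```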

Uniformly 15N-labeled nsp3(1035-1181) was produced using the same protocol used for nsp3(1066-1181) to obtain an NMR sample containing 1.1 mM protein.

NMR spectroscopy. NMR measurements were performed at 298 K with Avance 600, DRX 700, and Avance 800 spectrometers equipped with TXI HCN z- or xyz-gradient probe heads (Bruker BioSpin, Billerica, MA). Proton chemical shifts were referenced to internal 3-(trimethylsilyl)-1-propanesulfonic acid sodium salt (DSS). The 13C and 15N chemical shifts were referenced indirectly to DSS by using the absolute frequency ratios (58). Sequence-specific resonance assignments for nsp3(1066-1181) were obtained as reported previously (50), and the same approach was used to assign the residues 1182 to 1203 in nsp3(1066-1203). In nsp3(1035-1181), the 15N and 1HN resonances of the residues 1036 to 1066 were assigned as a group by comparison with nsp3(1066-1181). Steady-state 15N{1H}-nuclear Overhauser effects (NOEs) were measured using transverse relaxation-optimized spectroscopy (TROSY)-based experiments (46, 61) on a Bruker Avance 600 spectrometer, with a saturation period of 3.0 s and a total interscan delay of 5.0 s.
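Two calculations mentioned in this paragraph can be sketched in a few lines. Indirect referencing places the 0-ppm frequency of a heteronucleus at a fixed ratio Ξ of the observed 1H DSS frequency. The Ξ values used below (0.251449530 for 13C relative to DSS and 0.101329118 for 15N relative to liquid NH3) are the widely used IUPAC/BMRB recommendations; they are assumed here, not quoted from reference 58. The steady-state heteronuclear NOE is taken as the ratio of cross-peak intensities recorded with and without 1H saturation. Function names and the example frequency are hypothetical.

```python
# Frequency ratios Xi = nu_ref(X) / nu_ref(1H of DSS); assumed IUPAC/BMRB values
XI = {"13C": 0.251449530, "15N": 0.101329118}

def reference_frequency_mhz(dss_1h_mhz: float, nucleus: str) -> float:
    """0-ppm frequency (MHz) of a heteronucleus from the observed 1H DSS frequency."""
    return dss_1h_mhz * XI[nucleus]

def shift_ppm(signal_mhz: float, reference_mhz: float) -> float:
    """Chemical shift (ppm) of a signal relative to the reference frequency."""
    return (signal_mhz - reference_mhz) / reference_mhz * 1e6

def het_noe(intensity_saturated: float, intensity_reference: float) -> float:
    """Steady-state 15N{1H}-NOE: intensity with 1H saturation / intensity without."""
    return intensity_saturated / intensity_reference

# Hypothetical example: 1H DSS signal observed at 600.130 MHz
ref_15n = reference_frequency_mhz(600.130, "15N")
print(f"15N 0-ppm frequency: {ref_15n:.6f} MHz")
print(f"NOE for I_sat = 80, I_ref = 100: {het_noe(80.0, 100.0):.2f}")
```

A rigid backbone amide that only follows the overall tumbling gives a ratio well above zero, whereas fast internal motion lowers it or makes it negative, which is the basis for the interpretation of the tail regions in the Results.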

Determination of amide proton Pf. Amide proton protection factors (Pf) for nsp3(1066-1181) were determined using a 1.2 mM 15N-labeled protein sample that was lyophilized from H2O solution and then redissolved in 99% D2O. The decay of the signal intensity of the 15N-1H correlation peaks due to the amide proton chemical exchange with D2O was monitored by acquiring a series of 2D [15N,1H]-heteronuclear single-quantum coherence ([15N,1H]-HSQC) spectra at different times after preparation of the D2O solution, for a total period of 2 weeks. These data were analyzed using the software CARA (23), and for each peak, the volume was plotted versus the reaction time. The exponential decay constants yielded preliminary values for the Pf, which were then corrected for primary structure effects as described by Bai et al. (2).
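A minimal sketch of how a decay constant can be turned into a protection factor, assuming single-exponential decay of each peak volume, an unweighted log-linear least-squares fit (the fitting procedure used with CARA is not specified here), and Pf = log10(kint/kex) as defined in the Results. The intrinsic rate kint, which in practice comes from the primary-structure corrections of Bai et al. (2), is simply passed in as a number; function names and example values are hypothetical.

```python
import math

def fit_exchange_rate(times_h, volumes):
    """Fit V(t) = V0 * exp(-k_ex * t) by linear least squares on ln V.

    Returns k_ex in 1/h (same time unit as times_h)."""
    n = len(times_h)
    logs = [math.log(v) for v in volumes]
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return -sxy / sxx  # slope of ln V versus t equals -k_ex

def protection_factor(k_ex, k_int):
    """Pf = log10(k_int / k_ex); both rates must be in the same units."""
    return math.log10(k_int / k_ex)

# Hypothetical peak sampled over 2 weeks (336 h), decaying with k_ex = 0.01 / h
times = [0, 12, 24, 48, 96, 168, 336]
volumes = [1000.0 * math.exp(-0.01 * t) for t in times]
k_ex = fit_exchange_rate(times, volumes)
print(f"k_ex = {k_ex:.4f} /h")
print(f"Pf   = {protection_factor(k_ex, k_int=10.0):.2f}")
```

A log-linear fit weights late, low-intensity points as heavily as early ones; a direct nonlinear fit of V0 and kex is a common alternative when noise dominates the late spectra.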

Structure determination. The input for the structure calculation was obtained from a 3D 15N-resolved [1H,1H]-NOE spectroscopy ([1H,1H]-NOESY) spectrum and from two 3D 13C-resolved [1H,1H]-NOESY spectra recorded with the carrier frequency in the aliphatic and the aromatic regions, respectively. All three data sets were recorded with a mixing time of 60 ms. The software ATNOS/CANDID (17, 18) was used in combination with the torsion angle dynamics algorithm of CYANA (15). Seven cycles of automated NOE cross-peak identification with ATNOS (18), automated NOE assignment with CANDID (17), and structure calculation with CYANA were performed. In the second and subsequent cycles, the intermediate protein structure was used as an additional guide for the interpretation of the NOESY spectra. During the first six cycles, ambiguous distance restraints were used, and in the final cycle, only distance restraints that could be attributed to a single pair of hydrogen atoms were retained. The 20 conformers with the lowest residual CYANA target function values obtained from the seventh ATNOS/CANDID/CYANA cycle were subjected to energy minimization in a water shell with the program OPALp (25, 33), using the AMBER force field (8). The program MOLMOL (26) was used to analyze the ensemble of 20 energy-minimized conformers.

Structure validation. Analyses of the stereochemical qualities of the molecular 
models were accomplished using the Joint Center for Structural Genomics Val- 
idation Central Suite (http://www.jcsg.org). 

Study of the interaction of nsp3(1066-1181) with ssRNA. The interaction of nsp3(1066-1181) with unlabeled ssRNA1 (5'-AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU-3') was evaluated by comparison of the 2D [15N,1H]-HSQC spectra of uniformly 15N-labeled nsp3(1066-1181) recorded at four protein/ssRNA1 ratios, 3:1, 1:1, 1:2, and 1:3. As a control, a 2D [15N,1H]-HSQC spectrum was obtained after addition of single-stranded 5'-CUUGUUCAUU-3' in fourfold excess with respect to the protein concentration under otherwise identical conditions.
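The text does not state how the chemical shift changes between free and RNA-bound spectra were quantified. One widely used convention, shown here only as an assumed illustration and not as the procedure used in this study, combines the 1H and 15N changes of each amide cross-peak into a single value, with the 15N change scaled by a factor α (commonly somewhere around 0.1 to 0.2) to compensate for its larger ppm range. Peak labels, shift values, α, and the cutoff below are all hypothetical.

```python
import math

def combined_shift_change(d_h_ppm: float, d_n_ppm: float, alpha: float = 0.14) -> float:
    """Weighted combined amide shift change sqrt(dH^2 + (alpha * dN)^2), in ppm."""
    return math.sqrt(d_h_ppm ** 2 + (alpha * d_n_ppm) ** 2)

def perturbed_residues(free, bound, cutoff_ppm, alpha=0.14):
    """Peaks whose combined change between free and bound spectra exceeds cutoff_ppm.

    free and bound map a peak label to its (1H, 15N) shifts in ppm."""
    hits = {}
    for label, (h_free, n_free) in free.items():
        if label not in bound:
            continue  # peak not observed in the bound spectrum (e.g., broadened out)
        h_bound, n_bound = bound[label]
        delta = combined_shift_change(h_bound - h_free, n_bound - n_free, alpha)
        if delta > cutoff_ppm:
            hits[label] = round(delta, 3)
    return hits

# Hypothetical shifts (ppm), not data from this study
free = {"peak1": (8.20, 120.0), "peak2": (7.90, 118.5), "peak3": (8.45, 125.2)}
bound = {"peak1": (8.31, 120.9), "peak2": (7.91, 118.6), "peak3": (8.45, 125.2)}
print(perturbed_residues(free, bound, cutoff_ppm=0.05))
```

Peaks that disappear on RNA addition are skipped here, although in practice such line broadening can itself indicate involvement in binding.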

Electrophoretic mobility shift assays (EMSAs). nsp3(1066-1181) was mixed with an ssRNA or single-stranded DNA substrate in an assay buffer containing 50 mM NaCl and 50 mM sodium phosphate at pH 6.5. The following custom-synthesized oligonucleotides (Integrated DNA Technologies, Inc., San Diego, CA) were tested: randomized 20-mer DNA and RNA; the homopolymers A20, C20, and U20; 5'-CCCGAUACCC-3', which contains the core GAUA sequence that was shown previously to bind to nsp3a (49); 5'-CUAAACGAAC-3', which is the leader transcription-regulatory sequence (TRS) from the SARS-CoV genome [TRS(+)]; 5'-GUUCGUUUAG-3', which is the leader TRS from the SARS-CoV antigenome [TRS(−)]; the decamers 5'-GAGAGAGAGA-3', 5'-GGAGGAGGAG-3', and 5'-GGGAGGGAGG-3'; the GGGA repeat oligomers 5'-GGGAGGGA-3' [(GGGA)2] and 5'-GGGAGGGAGGGAGGGAGGGA-3' [(GGGA)5]; and the G-positional decamers 5'-GGGAAAAAAA-3', 5'-AAAGGGAAAA-3', and 5'-AAAAAAAGGG-3'. Each reaction mixture contained between 0 and 495 µM protein and 10 µg of the RNA or DNA substrate, equivalent to 80 µM 20-mer, 160 µM decamer, or 190 µM octamer nucleic acids. Protein-nucleic acid mixtures were incubated for 1 h at 37°C and then analyzed by native electrophoresis on precast 6% acrylamide DNA retardation gels (Invitrogen). Nucleic acid was detected using SYBR gold poststain (Invitrogen) and photographed using a UV light source equipped with a digital camera. SYBR gold was rinsed out, and protein was subsequently detected using SYPRO ruby poststain (Invitrogen).
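The stated equivalence between 10 µg of substrate and the molar concentrations can be checked with a simple mass-to-moles conversion. The reaction volume is not given in the text; the sketch below assumes a hypothetical 20-µl reaction and an approximate average mass of 330 g/mol per nucleotide. Under those assumptions it gives roughly 76, 152, and 189 µM for the 20-mer, decamer, and octamer, close to the quoted 80, 160, and 190 µM. The function name is hypothetical.

```python
def micromolar(mass_ug: float, n_nt: int, volume_ul: float,
               nt_mass_g_per_mol: float = 330.0) -> float:
    """Approximate concentration (µM) of an oligonucleotide from its mass.

    Molecular mass is approximated as length * average nucleotide mass,
    ignoring terminal groups and base composition."""
    moles = (mass_ug * 1e-6) / (n_nt * nt_mass_g_per_mol)
    return moles / (volume_ul * 1e-6) * 1e6

# 10 µg of substrate in a hypothetical 20-µl reaction
for name, length in [("20-mer", 20), ("decamer", 10), ("octamer", 8)]:
    print(f"{name}: {micromolar(10.0, length, 20.0):.0f} µM")
```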

Protein structure accession number. The atomic coordinates of the bundle of 20 conformers used to represent the solution structure of nsp3(1066-1181) have been deposited in the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) with the accession code 2k87.


RESULTS AND DISCUSSION 


At the outset of this project, 1D 1H NMR spectroscopy analysis of the 160-residue construct nsp3(1066-1225) indicated the presence of both a globular domain and flexibly disordered polypeptide segments. Based on the prediction of disordered segments by GlobPlot (29), we prepared two new constructs comprising the residues 1066 to 1203 and 1066 to 1181. Initial NMR data then showed that while nsp3(1066-1181) contained the entire globular domain, nsp3(1066-1203) also included a flexibly disordered C-terminal tail. nsp3(1066-1181), which also provided higher-quality NMR data, was therefore selected for complete NMR structure determination. To further investigate the linker region in the N-terminal direction from the globular domain, we also expressed and purified uniformly 15N-labeled nsp3(1035-1181).

NMR structure of nsp3(1066-1181). The NMR structure determination was based on the previously reported resonance assignments (50). Three 3D heteronuclear-resolved [1H,1H]-NOESY spectra were recorded with a mixing time of 60 ms. NOESY peak picking, NOE assignment, and the structure calculation were carried out with the programs ATNOS, CANDID, and CYANA (see Materials and Methods for details). The seventh cycle of the ATNOS/CANDID/CYANA calculation yielded 2,368 meaningful NOE upper distance limits. In the resulting structure, the average global root mean square deviation relative to the mean coordinates calculated for the backbone atoms of residues 1071 to 1178 in the energy-refined bundle of 20 conformers (Fig. 1a) was 0.44 ± 0.10 Å. Combined with the residual CYANA target function value of 0.65 ± 0.23 Å2 (Table 1), this indicates a high-quality NMR structure determination.
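The backbone rmsd quoted above is an average, over the bundle, of each conformer's deviation from the mean coordinates. The pure-Python sketch below shows that calculation under the simplifying assumption that the conformers have already been superimposed on the selected atoms (in practice a program such as MOLMOL performs that superposition first). Function names and coordinates are hypothetical, and the population standard deviation is used because the convention behind Table 1 is not stated.

```python
import math
from statistics import mean, pstdev

Coord = tuple[float, float, float]

def mean_structure(conformers: list[list[Coord]]) -> list[Coord]:
    """Atom-wise mean coordinates of pre-superimposed conformers."""
    n = len(conformers)
    return [tuple(sum(c[i][k] for c in conformers) / n for k in range(3))
            for i in range(len(conformers[0]))]

def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root mean square deviation between two equally ordered atom lists."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))

def bundle_rmsd_to_mean(conformers):
    """Mean and (population) standard deviation of per-conformer rmsd to the mean."""
    ref = mean_structure(conformers)
    values = [rmsd(c, ref) for c in conformers]
    return mean(values), pstdev(values)

# Two hypothetical two-atom "conformers", already superimposed
bundle = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)]]
avg, sd = bundle_rmsd_to_mean(bundle)
print(f"rmsd to mean: {avg:.2f} ± {sd:.2f} Å")
```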

Overall, the NMR structure of nsp3(1066-1181) includes eight β-strands, two α-helices, and two 3₁₀-helices (Fig. 1b), which are arranged in the sequential order β1-β2-β3-α1-β4-β5-3₁₀-3₁₀-β6-β7-α2-β8. The eight β-strands form two antiparallel β-sheets, the first containing β1 and β6 and the second containing β2 and β8, and a parallel half-barrel comprising β3, β4, β5, and β7. Helices α1 and α2 are oriented antiparallel to strands β3 and β7, respectively (Fig. 1b). In the 3D fold, the two 3₁₀-helices and the two short antiparallel β-sheets are anchored against the surface of the barrel-like molecular core formed by the four-strand β-sheet and the two α-helices.

FIG. 1. (a to c) Stereo views of the NMR structure of nsp3(1066-1181). (a) Bundle of 20 energy-minimized CYANA conformers. The polypeptide backbone is shown as a gray spline function through the Cα positions. Selected sequence positions in the globular domain are indicated by numerals, where the numbers 1 to 116 correspond to nsp3 residues 1066 to 1181. (b) Ribbon representation of the conformer in panel a that is closest to the mean coordinates. The regular secondary structure elements and the two chain ends at positions 6 and 112 are indicated. (c) All-heavy-atom presentation of the conformer in panel b. The backbone is represented by a gray spline function through the Cα atoms, amino acid side chains with local displacements of <0.6 Å are colored blue, those with local displacements of >0.6 Å are red, and the three Trp residues are highlighted in green (see the text). (d) Representation of the electrostatic potential surface showing the area that was found, by chemical shift perturbation experiments, to contain the residues involved in ssRNA binding. The locations of selected residues are identified.

TABLE 1. Input for the structure calculation and characterization of the bundle of 20 energy-minimized CYANA conformers that represent the NMR structure of nsp3(1066-1181)

Quantity                                          Value(a)
No. of NOE upper distance limits                  2,368*
  Intraresidual                                   544*
  Short range                                     670*
  Medium range                                    397*
  Long range                                      757*
No. of dihedral angle constraints                 118*
Residual target function (Å2)                     0.65 ± 0.23*
Residual NOE violations
  No. ≥0.1 Å                                      52
  Maximum (Å)                                     0.35 ± 0.01*
Residual dihedral angle violations
  No. ≥2.5°                                       3 ± 1*
  Maximum (°)                                     71.3 ± 2.74*
AMBER energies (kcal/mol)
  Total                                           −4,054.95 ± 86.30
  van der Waals                                   −342.05 ± 11.19
  Electrostatic                                   −4,259.94 ± 85.38
rmsd from ideal geometry
  Bond length (Å)                                 0.0075 ± 0.0001
  Bond angle (°)                                  1.941 ± 0.36
rmsd relative to mean coordinates (Å)(b)
  bb (1071-1178)                                  0.44 ± 0.10
  ha (1071-1178)                                  0.85 ± 0.15
Ramachandran plot, residues(c) in
  Most favored regions (%)                        78
  Additional allowed regions (%)                  19
  Generously allowed regions (%)                  3
  Disallowed regions (%)                          0

(a) Entries marked with asterisks refer to the 20 CYANA conformers with the lowest residual target function values; the remaining entries refer to the same conformers after energy minimization with OPALp. The ranges indicate the minimum and maximum values. Where applicable, values are given as means ± standard deviations.
(b) bb indicates the backbone atoms N, Cα, and C′; ha stands for all heavy atoms. The numbers in parentheses indicate the residues for which the rmsd was calculated.
(c) As determined by PROCHECK (27).

Along the polypeptide chain, the first regular secondary structure, β1, comprises residues 9 and 10 (where residue numbers 1 to 116 correspond to nsp3 residues 1066 to 1181) and is connected via a well-defined 6-amino-acid linker to strand β2, containing residues 17 and 18. A 3-residue turn leads to β3, which spans residues 23 to 27 and is connected via a short turn to helix α1, with residues 30 to 40, and then a 7-residue loop affords the link to strand β4, with residues 48 to 54. Next, a type VIa turn (7) with a cis-proline in position 56 connects to β5, with residues 62 to 66, which is oriented parallel to β4. Two 3₁₀-helices with residues 67 to 69 and 72 to 74 are part of the linker sequence to β6, which is formed by residues 78 and 79. A well-defined 6-residue loop leads to β7, with residues 85 to 88, a further 6-residue linker leads to α2, with residues 95 to 108, and the last regular secondary structure is β8, containing residues 110 to 112.
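The local numbering used in the text and in Figs. 1 to 3 differs from full-length nsp3 numbering by a constant offset of 1,065 (local residue 1 is nsp3 residue 1066). The sketch below, with hypothetical names (including the labels "3₁₀-a" and "3₁₀-b" for the two 3₁₀-helices), converts the secondary structure elements listed above into nsp3 numbering.

```python
OFFSET = 1065  # local residue 1 corresponds to nsp3 residue 1066

# Secondary structure elements of nsp3(1066-1181) in local numbering, as listed in the text
ELEMENTS = [
    ("β1", 9, 10), ("β2", 17, 18), ("β3", 23, 27), ("α1", 30, 40),
    ("β4", 48, 54), ("β5", 62, 66), ("3₁₀-a", 67, 69), ("3₁₀-b", 72, 74),
    ("β6", 78, 79), ("β7", 85, 88), ("α2", 95, 108), ("β8", 110, 112),
]

def to_nsp3(local: int) -> int:
    """Convert a local residue number (1-116) to full-length nsp3 numbering."""
    if not 1 <= local <= 116:
        raise ValueError(f"local residue {local} is outside 1-116")
    return local + OFFSET

for name, start, end in ELEMENTS:
    print(f"{name:6s} {start:3d}-{end:<3d} -> nsp3 {to_nsp3(start)}-{to_nsp3(end)}")
```

For example, the cis-proline at local position 56 is nsp3 residue 1121. Assuming the one-letter residue labels used in the text (such as K75) also follow local numbering, as the secondary structure assignments suggest, the positive-patch residues K75, K76, K99, and R106 correspond to nsp3 residues 1140, 1141, 1164, and 1171.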

Overall, the electrostatic potential surface of nsp3(1066-1181) shows a homogeneous charge distribution, with the sole exception of a positive patch formed by the residues K75, K76, K99, and R106, which are located in the loop preceding strand β6 (K75 and K76) and in helix α2 (K99 and R106) (Fig. 1d).

Since a search of the PDB provided evidence that the NAB forms a new polypeptide fold, we performed additional NMR experiments to obtain independent support for this novel structure. We measured the exchange rates of the amide protons of the NAB with deuterons from the solvent by dissolving a sample of the lyophilized protein in D2O. The rates at which the amide proton signal intensities decrease provide information on the amount of protection of each proton by the protein secondary and tertiary structures, with greater protection and lower exchange rates reflecting involvement in the hydrogen bonds of regular secondary structures and/or sequestration from the solvent by the protein's tertiary structure. This method is distinct from the NMR experiments that had been used to provide conformational constraints for the structure determination and thus provides an independent check on the compatibility of the 3D structure with experimental data. Amide proton Pf are defined as log(kint/kex), where kex is the measured hydrogen/deuterium exchange rate constant and kint is the intrinsic hydrogen/deuterium exchange rate constant for the same residue type when completely exposed to the solvent water. The reference value kint is determined by the nature of the residue and its sequential neighbors (2). High Pf values, after correction for primary structure effects, prevail for nearly all the backbone amide groups in the regular secondary structures (Fig. 2), thus providing independent support for the NOE-derived novel fold of nsp3(1066-1181). The residues with the highest Pf values are located in strands β5 and β7, which exhibit a dense hydrogen bonding network (Fig. 3) and are buried in the core of the protein. The expected pattern of Pf values for amide protons in CO(i)-NH(i+4) hydrogen bonds of α-helices (where i represents the position of the residue of interest in the protein sequence) is clearly seen for helices α1 and α2, with outstandingly high Pf values for the residues M39 and C107. The only apparent discrepancy from the NOE-based regular secondary structure determination was noted for the short β-sheet formed by strands β2 and β8 (Fig. 3), where no measurable exchange protection was seen. This β-sheet is solvent exposed near the protein surface (Fig. 1b), which is probably the reason for the weak protection observed. The β-sheet topology in the NMR structure of nsp3(1066-1181) reveals that the residues with high Pf values are in most instances structurally constrained by larger numbers of NOEs than those with lower Pf values (Fig. 3). There are also a few residues in nonregular secondary structure regions that exhibit significant Pf. These are either hydrogen bonded or buried in the molecular core or both. An example is F42, whose amide group interacts with the carbonyl group of N37 and the side chain hydroxyl group of T40.

[Figure 2 plot: Pf (vertical axis) versus amino acid sequence positions 1 to 116 (horizontal axis); the secondary structure elements β1, β2, β3, α1, β4, β5, 3₁₀, 3₁₀, β6, β7, α2, and β8 are marked at the top.]

FIG. 2. Histogram of the amide proton Pf versus the amino acid sequence of nsp3(1066-1181); the numbers 1 to 116 along the horizontal axis correspond to nsp3 residues 1066 to 1181. At the top, the positions of the regular secondary structure elements are indicated. Pf is defined as log(kint/kex), where kex is the measured hydrogen/deuterium exchange rate constant and kint is the intrinsic hydrogen/deuterium exchange rate constant for the same residue type when completely exposed to the solvent water (2). Higher Pf values reflect lower amide proton exchange rates with the solvent, which result from involvement in hydrogen bonding networks and/or sequestration from the solvent by the protein's tertiary structure.

FIG. 3. β-Sheet topology in nsp3(1066-1181), where blue lines represent hydrogen bonds that were identified by MOLMOL (26) in at least 10 conformers out of the ensemble of 20 conformers depicted in Fig. 1. Interstrand 1H-1H NOEs are indicated by double-headed black arrows. Amide groups of residues with Pf values of ≥2.0 are color coded in red.

Similar information was obtained for the tryptophan indole protons. The indole protons of W103 and W109, with δ(15N) values near 129 ppm and with 1H chemical shifts of 11.0 and 10.1 ppm, respectively (Fig. 4a and b), do not show measurable protection, whereas the indole proton of W87, at δ(1H) of 10.6 ppm, is observable for more than 12 days after the dissolution of the protein in D2O. This behavior correlates with the locations of these residues in the 3D structure, where W87 is located in the core of the protein and W103 and W109 are solvent accessible near the protein surface (Fig. 1c).

3D structure homology search using the PDB, SCOP, and CATH databases. A 3D structure homology search was performed with the software DALI (19, 20), using the conformer closest to the mean coordinates of nsp3(1066-1181) (Fig. 1b) as the input. The search returned more than 300 hits, all with DALI Z scores below 3.3. This outcome is due to a certain degree of similarity between parts of the polypeptide fold of the NAB globular domain (Fig. 1b) and proteins from the signal transduction protein CheY family, which has a large representation in the PDB. However, visual inspection showed that although some regular secondary structure elements overlap, the arrangements of the β-strands are characteristically different in each pairwise comparison with individual CheY proteins. Structure homology searches using the SCOP (32) and CATH (39) databases provided similar results, and no protein was identified in the three databases that would form a globular fold of the type seen for the SARS-CoV NAB. Although new folds were previously identified in other regions of the SARS-CoV proteome (see, for example, reference 1), the NAB is the first domain within nsp3 for which standard homology searches indicate a new fold. This finding is intriguing in view of previous observations indicating a trend toward 3D structure redundancy among nsp3 domains, as exemplified by the three macrodomain-like folds of ADRP [nsp3(184-351)] and the N-terminal and middle regions of the SUD {SUD-N [nsp3(389-517)] and SUD-M [nsp3(527-651)]} and by the ubiquitin-like folds of UB1 [nsp3(1-112)] and UB2 [nsp3(723-792)] (5, 45, 48, 49, 55).
Exploring the overall organization of the NAB domain within nsp3. To characterize the linker segments that flank the nsp3(1066-1181) globular domain, we studied the two constructs nsp3(1035-1181) and nsp3(1066-1203). The additional 31 residues in nsp3(1035-1181) correspond to the segment linking the globular domains of the papain-like protease [nsp3(723-1037)] and the NAB (Fig. 5). For nsp3(1066-1203), the C-terminal extension by 22 residues was somewhat arbitrary, due to the lack of information on the location of the nearest following domain. In the 2D [15N,1H]-HSQC spectra of the two constructs (Fig. 4a and b), the peaks from residues 1071 to 1177 maintain the same chemical shifts that were observed for nsp3(1066-1181), indicating that all three constructs contain identical globular domains. Nearly all the peaks of the polypeptide segments of residues 1035 to 1070 and 1178 to 1203 are in the random-coil chemical shift region, with 1H chemical shifts between 7.5 and 8.5 ppm. The increased intensities of the resonances from the two tails, compared with the peaks from the globular domain, are indicative of flexibly disordered polypeptide segments. Increased mobility of both tail regions was confirmed by 15N{1H}-NOE experiments (Fig. 4c). The residues 1071 to 1178, for which the motion of the backbone 15N-1H moieties is essentially restricted to the overall tumbling of the molecule, have positive NOE values of about 0.8, whereas the residues 1035 to 1065 and 1180 to 1203 have values in the range of −0.4 to 0.4, indicating increased dynamics on the subnanosecond time scale for these polypeptide segments.

Overall, nsp3 is characterized by the arrangement of small globular domains linked by flexibly disordered polypeptide segments (Fig. 5). This domain organization may play a functional role by governing substrate accessibility and protein-protein interactions, which could bring the protease, deubiquitinating, and nucleic acid-binding activities into spatial proximity (45, 48, 49). Possible interactions between nsp3 and other SARS-CoV proteins have been investigated using binding studies with nsp3 fragments of various lengths (21, 40, 57). Among the fragments that contain the NAB, Imbert et al. (21) reported that nsp3(1033-1418) interacts with multiple other SARS-CoV nonstructural proteins, including nsp5, nsp12, nsp13, nsp14, nsp15, nsp16, and other nsp3 domains; von Brunn et al. (57) reported interactions of nsp3(722-1921) with nsp2, ORF3a, and ORF9b; and Pan et al. (40) reported interactions of nsp3(726-1438) with nsp4 and nsp12. These results indicate that nsp3 may have multiple functional partners within the SARS-CoV proteome. The now available structural coverage of nsp3 (Fig. 5) provides a basis for the design of additional interaction studies that could yield specific information about individual nsp3 domains.

The combination of flexibly disordered regions and structured binding motifs, such as the Staufen double-stranded RNA-binding domains and the RNA recognition domains of the polypyrimidine tract-binding protein, is a common feature of RNA-binding proteins. Modular organization can increase the specificity and affinity of binding compared with those of the individual domains, for example, by allowing simultaneous interactions with different segments of an RNA sequence. Within nsp3, potential partners of the NAB for such

FIG. 4. (a) Superposition of the 2D [15N,1H]-HSQC spectra of nsp3(1066-1181) (black) and nsp3(1035-1181) (green). (b) Superposition of the 2D [15N,1H]-HSQC spectra of nsp3(1066-1181) (black) and nsp3(1066-1203) (red). (c) Plot of relative 15N{1H}-NOE intensities versus the amino acid sequence of the nsp3 fragment comprising residues 1035 to 1203. Red squares represent the experimental measurements for the backbone amide groups in the construct nsp3(1066-1203), and green squares represent those for nsp3(1035-1181). The broken vertical line is used to indicate that the NMR signals
of the residues 1035 to 1065 were assigned as a group (see Materials 
and Methods), and the data points to the left of this line have arbi- 
trarily been arranged in the order of decreasing J,., values. The 
'SN{'H}-NOEs were recorded at a 'H frequency of 600 MHz by using 
a saturation period of 3.0 s and a total interscan delay of 5.0 s (46, 61). 
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FIG. 5. Structural organization of the nsp3 fragment containing residues 1 to 1318. The solid line at the top indicates the initial domain 
annotation based on bioinformatics and phylogenetic analyses. The dashed line to the right represents the C-terminal segment of residues 1318 
to 1922 of nsp3. Below, the presently known structural coverage with globular domains and flexibly disordered linker segments is shown. Ribbon 
representations are used for the globular domains. Flexibly disordered regions revealed by NMR spectroscopy and disordered segments implicated 
by X-ray crystallography are shown as blue and green lines, respectively. Gray lines and rectangles represent polypeptide segments with so far 


unknown structures. SUD-C, C-terminal region of SUD. 


concerted action include UB1, which exhibits a ubiquitin-like 
fold with nucleic acid-binding activity (49), and SUD-N and 
SUD-M, which adopt macrodomain folds and bind RNA 
(5, 55). 
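The ¹⁵N{¹H}-NOE criterion used above to separate the globular domain (NOE of about 0.8) from the flexible tails (−0.4 to 0.4) amounts to a simple threshold rule. The Python sketch below is illustrative only: the helper names and the cutoff of 0.6 are assumptions (any cutoff between the reported tail maximum of about 0.4 and the core value of about 0.8 separates the two groups), and the example input values are made up to fall inside the reported ranges; they are not measured data.

```python
from typing import Dict, List, Tuple

# Assumed cutoff: the text reports ~0.8 for the globular core and
# -0.4 to 0.4 for the tails, so 0.6 lies between the two populations.
NOE_CUTOFF = 0.6

def classify_by_noe(noe: Dict[int, float], cutoff: float = NOE_CUTOFF) -> Dict[int, str]:
    """Label each residue 'structured' (NOE >= cutoff) or 'flexible'."""
    return {res: ("structured" if val >= cutoff else "flexible")
            for res, val in noe.items()}

def segments(labels: Dict[int, str], state: str) -> List[Tuple[int, int]]:
    """Merge consecutive residue numbers carrying `state` into (first, last) ranges."""
    ranges: List[Tuple[int, int]] = []
    for res in sorted(labels):
        if labels[res] != state:
            continue
        if ranges and ranges[-1][1] == res - 1:
            ranges[-1] = (ranges[-1][0], res)  # extend the current run
        else:
            ranges.append((res, res))          # start a new run
    return ranges

# Illustrative (not measured) values chosen inside the reported ranges.
example = {1060: 0.1, 1061: -0.2, 1071: 0.80, 1072: 0.78, 1073: 0.82, 1185: 0.3}
labels = classify_by_noe(example)
print(segments(labels, "structured"))  # [(1071, 1073)]
print(segments(labels, "flexible"))    # [(1060, 1061), (1185, 1185)]
```

Residues without data simply break a run, so in practice gaps in the assignment would need to be handled before merging segments.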

Mapping of ssRNA-binding sites. The NAB has been shown to interact with nucleic acids (36). Here, we used binding experiments with exogenous ssRNA to locate the nucleic acid-binding sites. Chemical shift perturbation studies were performed using uniformly ¹⁵N-labeled nsp3(1066-1181) and unlabeled ssRNA1, which has the sequence 5'-AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU-3'. This RNA was chosen as a follow-up to our earlier study, in which EMSAs showed binding of the NAB to this sequence (36). The oligonucleotide was designed to contain the sequence AUA, which copurified with nsp3a during expression in E. coli (49), and to be single stranded with minimal secondary structure.
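The sequence design of ssRNA1 described above can be checked with a few lines of code. The following Python sketch (the helper name `motif_positions` is hypothetical) counts the nucleotides and lists every AUA occurrence, including overlapping ones; the pattern argument is a regular expression, so a degenerate motif can be written with a character class, for example `C[ACGU]GG[ACGU]` for a CNGGN-type sequence.

```python
import re

# ssRNA1 as given in the text (5'->3'), written as one string.
SSRNA1 = "AAAUACCUCUCAAAAAUAACACCACACCAUAUACCACAU"

def motif_positions(seq: str, pattern: str) -> list:
    """1-based start positions of all (possibly overlapping) regex matches."""
    # A zero-width lookahead lets overlapping hits such as AUAUA be reported twice.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]

print(len(SSRNA1))                     # 39
print(motif_positions(SSRNA1, "AUA"))  # [3, 16, 29, 31]
print(SSRNA1.count("G"))               # 0
```

The check confirms that the 39-nucleotide ssRNA1 carries four AUA elements and no guanosines, which is consistent with the intent to present the AUA sequence in a single-stranded context.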

Upon the addition of a threefold excess of the ssRNA, a small number of residues show highly selective chemical shift perturbations (Fig. 6). These residues are located in a positively charged protein surface patch defined by the residues K1140, K1141, K1164, and R1171 (K75, K76, K99, and R106 in the construct numeration in Fig. 1d; see also Fig. 8). The nearby residues N1082, A1083, S1084, D1131, H1134, and T1162 are also affected by the presence of ssRNA. It is worth mentioning that although the NAB exhibits a new fold with no apparent similarity to other RNA-binding proteins, a detailed comparison of the nucleic acid-binding site thus identified with those of other RNA-binding polypeptides revealed significant similarities. These RNA interaction sites typically consist of a surface patch of positively charged and aromatic residues, some of which are also neighbors in the sequence (28, 36, 37, 44). The NAB shows particularly close similarity to the RNA-binding site of the sterile alpha motif (SAM) of the Saccharomyces cerevisiae Vts1p protein, which exhibits high affinity for ssRNA with the sequence CNGGN, where N can be any of the four ribonucleotides. Vts1p is an α-helical protein in which the residues involved in binding to ssRNA are located in a loop between helices α2 and α3 (⁴⁶⁴RLHKY⁴⁶⁸) and in the first two turns of helix α5 (LGARK). In the NAB, most of the interacting residues are located in the linker segment comprising the two 3₁₀-helices and at the start of helix α2. In addition to the common arrangement of a patch of positively charged residues in the two proteins, other residues are also conserved. For example, Y1132 and H1134 in the SARS-CoV NAB (Y67 and H69 in Fig. 1d) correspond to Y468 and H466 in Vts1p (Fig. 7).
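The paired residue labels quoted above (K1140 = K75, K1141 = K76, K1164 = K99, R1171 = R106, Y1132 = Y67, H1134 = H69) all differ by a constant offset of 1065, i.e., construct residue 1 corresponds to nsp3 residue 1066. A minimal Python sketch of the conversion between the two numbering schemes, assuming that this offset holds across the whole nsp3(1066-1181) construct; the function names are hypothetical:

```python
# Full-length nsp3 numbering vs. construct numbering of nsp3(1066-1181).
# Assumption (inferred from the residue pairs quoted in the text, e.g.
# K1140 = K75): construct residue 1 corresponds to nsp3 residue 1066.
CONSTRUCT_FIRST, CONSTRUCT_LAST = 1066, 1181
OFFSET = CONSTRUCT_FIRST - 1  # 1065

def to_construct(label: str) -> str:
    """'K1140' (full-length nsp3) -> 'K75' (construct numbering)."""
    aa, num = label[0], int(label[1:])
    if not CONSTRUCT_FIRST <= num <= CONSTRUCT_LAST:
        raise ValueError(f"{label} lies outside nsp3(1066-1181)")
    return f"{aa}{num - OFFSET}"

def to_full_length(label: str) -> str:
    """'H69' (construct numbering) -> 'H1134' (full-length nsp3)."""
    aa, num = label[0], int(label[1:])
    if not 1 <= num <= CONSTRUCT_LAST - OFFSET:
        raise ValueError(f"{label} lies outside the construct numbering 1-116")
    return f"{aa}{num + OFFSET}"

for res in ["K1140", "K1141", "K1164", "R1171", "Y1132", "H1134"]:
    print(res, "->", to_construct(res))
# K1140 -> K75, K1141 -> K76, K1164 -> K99,
# R1171 -> R106, Y1132 -> Y67, H1134 -> H69
```

The range checks guard against converting residues that lie in the flanking linkers, which are not part of this construct.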

Investigation of SARS-CoV NAB homologues with the use of BLAST searches revealed four relatively distant protein clusters corresponding to the four major group II coronavirus lineages (Fig. 8). Comparison of the species listed in Fig. 8 reveals significant sequence conservation in the segments corresponding to the β-strands of SARS-CoV nsp3(1066-1181), indicating that at least some features of this structure might also be present in the homologous proteins. One observes further that there is pronounced divergence in the polypeptide regions implicated in RNA binding, suggesting that conservation at the level of the 3D structure would not necessarily go along with conservation of the physiological function.

RNA-binding specificity. The results described above allowed us to delineate the RNA-binding site on nsp3e. We followed this up by carrying out additional EMSA experiments to test the range of RNA sequences to which nsp3e might bind. We selected a group of RNAs that had previously also been studied with other nsp3 domains (5, 36; Johnson, M. A., A. Chatterjee, et al., unpublished data); thus, in addition to characterizing binding to the NAB, the experiments enabled us to compare RNA binding to the NAB with that to the other nsp3 domains. Figure 9 shows that nsp3e binds strongly to A- and G-containing RNAs, especially the two (GGGA)-repeat oligomers, GGGAGGGAGG, GGAGGAGGAG, AAAAAAAGGG, and AAAGGGAAAA. The protein has weak affinity for random RNA sequences and minimal affinity for random DNA sequences.

nsp3e appears to bind most strongly to the sequences containing repeats of three consecutive guanosines; for example, in Fig. 9, middle right panels, a majority of the (GGGA)-repeat and GGGAGGGAGG RNAs are bound to the protein even at an approximately [...] ratio; GGAGGAGGAG is also bound, but not quite as strongly. No evidence was seen for binding to A10, U10, or C10, to TRS(+) or its reverse complement, TRS(−), or to the RNA oligomer 5'-CCCGAUACCC-3', which contains the GAUA sequence that is recognized by the N-terminal domain of nsp3, nsp3a.

An earlier NMR study demonstrated that the macrodomain that forms SUD-M binds to oligo(A) RNA but has very little affinity for oligo(U) RNA. EMSAs showed weak binding of SUD-M to oligo(A), an (ACUG) repeat, and TRS(−). No affinity for TRS(+) or for the nsp3a oligomer was observable (5). Recent work in our laboratory (Johnson et al., unpublished) showed that SUD-M and the peptide comprising SUD-M and the SUD C-terminal region are both purine RNA-binding proteins and also have affinities for a range of G- and A-containing RNA sequences. The peptide comprising SUD-N and SUD-M has been shown to bind guanosines (54, 55). Thus, overall, the binding behavior of nsp3e appears to be similar to that of at least two other regions of nsp3, namely, the N-terminal nsp3a domain and the SUD. This may indicate functional linkage among the domains of nsp3.

In conclusion, the present paper extends the structural and functional coverage of SARS-CoV nsp3 to the N-terminal part of the initially annotated nsp3e domain. Figure 5 illustrates that this extension with flexible linkers and a globular domain is in line with the "string-of-pearls" appearance of the preceding nsp3 polypeptide region. In addition to performing the global structural characterization, we identified an ssRNA-binding site on the surface of the globular domain nsp3(1066-1181). The overall flexible arrangement of the globular domains along the nsp3 polypeptide chain (Fig. 5) suggests the possibility of concerted actions by multiple functionalities represented by different regions of the polypeptide chain, including ssRNA binding by the nsp3(1066-1181) globular domain.

FIG. 6. Studies of RNA binding using chemical shift perturbation experiments. Panel a shows the 2D [¹⁵N,¹H]-HSQC spectra of nsp3(1066-1181) in the absence (red) and presence (blue) of a threefold excess of ssRNA1 (see the text for the sequence). Residues with chemical shift changes are indicated, and some are also shown in panels b to d in expanded plots of the superimposed 2D [¹⁵N,¹H]-HSQC spectra.

FIG. 7. Ribbon representations of residues 1071 to 1177 in the NMR structure of nsp3(1066-1181) (left) and residues 445 to 517 in the SAM of Vts1p (right). In nsp3(1066-1181), the side chains of the residues with significant chemical shift perturbations (Fig. 6) are indicated. For the Vts1p SAM (PDB code 2ese), the residues located in the ssRNA interaction site described by Oberstrass et al. (37) are indicated.

FIG. 8. Sequence alignment of the polypeptide segment that forms the globular domain of the SARS-CoV NAB with homologues from other group II coronaviruses. Protein multiple-sequence alignment was performed using ClustalW2 and included sequences from SARS-CoV Tor2 (accession no. AAP41036) and representatives of three protein clusters corresponding to three group II coronavirus lineages identified by a BLAST search: bat coronavirus HKU5-5 (BtCoV-HKU5-5; accession no. ABN10901), BtCoV-HKU9-1 (accession no. P0C6T6), and human coronavirus HKU1-N16 (HCoV-HKU1-N16; accession no. ABD75496). Above the sequences, the positions in full-length SARS-CoV nsp3, the locations of the regular secondary structures in the presently solved NMR structure of the SARS-CoV NAB globular domain, and the residue numbering in this domain are indicated. Amino acids are colored according to conservation and biochemical properties, following ClustalW conventions. Residues implicated in interactions with ssRNA are marked with inverted black triangles. In the present context, the key features are that there is only one position with conservation of K or R (red) and that there are extended sequences with conservation of hydrophobic residues (blue) (see the text).

FIG. 9. Results from EMSA experiments demonstrating nucleic acid binding to nsp3e. In each column, pairs of gel photographs are shown: on the left, the gels are stained for RNA, and on the right, the same gels are stained for protein. The nucleic acid sequence and the protein concentrations, ranging from 0 to 495 μM, are given above each gel. To the left, the position of the protein is indicated by an open triangle, that of the RNA is indicated by a closed triangle, and the positions of protein-RNA complexes are indicated by open rectangles. The nucleic acid concentrations used were 80 μM for 20-mers, 160 μM for decamers, and 190 μM for octamers. dN and rN, randomized 20-mer DNA and RNA.